Corpus-Driven Bilingual Lexicon Extraction
نویسنده
چکیده
This paper introduces some key aspects of machine translation in order to situate the role of the bilingual lexicon in transfer-based systems. It then discusses the data-driven approach to extracting bilingual knowledge automatically from bilingual texts, tracing the processes of alignment at different levels of granularity. The paper concludes with some suggestions for future work. 1 Machine Translation
منابع مشابه
Improving Corpus Comparability for Bilingual Lexicon Extraction from Comparable Corpora
Previous work on bilingual lexicon extraction from comparable corpora aimed at finding a good representation for the usage patterns of source and target words and at comparing these patterns efficiently. In this paper, we try to work it out in another way: improving the quality of the comparable corpus from which the bilingual lexicon has to be extracted. To do so, we propose a measure of compa...
متن کاملExtraction de lexiques bilingues à partir de Wikipédia (Bilingual lexicon extraction from Wikipedia) [in French]
________________________________________________________________________________________________________ Bilingual lexicon extraction from Wikipedia With the increased interest of the machine translation, needs of multilingual resources such as comparable corpora and bilingual lexicon has increased. These resources are not available mainly for pair of languages that do not involve English. This...
متن کاملTowards a Generic Approach for Bilingual Lexicon Extraction from Comparable Corpora
This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the problem associated to polysemous words found in the seed bilingual lexicon when translating source context vectors. To improve the adequacy of context vectors, the use of a WordNetbased Word Sense Disambiguation process is tested. Experimental results...
متن کاملBilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora
This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, a special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...
متن کاملContext Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
This paper presents an approach that extends the standard approach used for bilingual lexicon extraction from comparable corpora. We focus on the unresolved problem of polysemous words revealed by the bilingual dictionary and introduce a use of a Word Sense Disambiguation process that aims at improving the adequacy of context vectors. On two specialized FrenchEnglish comparable corpora, empiric...
متن کامل